Neural Network and Deep Learning (Practice)
Econ 425T / Biostat 203B
1 Learning sources
This lecture draws heavily on the following sources.
- Deep Learning with Python by Francois Chollet.
- Learning Deep Learning lectures by Dr. Qiyang Hu (UCLA Office of Advanced Research Computing): https://github.com/huqy/deep_learning_workshops
2 Software
High-level software focuses on a user-friendly interface to specify and train models.
- Keras, PyTorch, scikit-learn, …

Lower-level software focuses on developer tools for implementing deep learning models.
- TensorFlow, PyTorch, CNTK, Theano (stopped development!), Caffe, Torch, …

Most tools are developed in Python plus a low-level language (C/C++, CUDA).
Source: https://www.simplilearn.com/keras-vs-tensorflow-vs-pytorch-article
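To make the contrast concrete, here is a minimal sketch of the same dense layer written with Keras's high-level interface and with lower-level TensorFlow ops. The batch size, feature count, and layer width are illustrative assumptions, not from the lecture.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

x = np.random.rand(8, 20).astype("float32")  # a batch of 8 examples, 20 features

# High-level (Keras): one line declares a fully connected layer.
dense = keras.layers.Dense(4, activation="relu")
y_high = dense(x)

# Lower-level (TensorFlow ops): the same computation written out by hand.
w = tf.Variable(tf.random.normal((20, 4)))
b = tf.Variable(tf.zeros(4))
y_low = tf.nn.relu(tf.matmul(x, w) + b)

print(y_high.shape, y_low.shape)  # both (8, 4)
```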
3 TensorFlow
Developed by the Google Brain team for internal Google use. Formerly DistBelief.
Open sourced in Nov 2015.
OS: Linux, MacOS, and Windows (since Nov 2016).
GPU support: NVIDIA CUDA.
TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.
Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.
TensorFlow supports distributed training.
TensorFlow does not support Apple Silicon (M1/M2) directly, but Apple provides the tensorflow-macos package for running on M1/M2 GPUs.

Used in a variety of Google apps: speech recognition (Google Assistant), Gmail (Smart Reply), search, translate, self-driving cars, …
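To verify a local install, for example tensorflow-macos on Apple Silicon, a quick sanity check is to list the devices TensorFlow can see. A minimal sketch, assuming TensorFlow 2.x:

```python
import tensorflow as tf

print(tf.__version__)
# An empty GPU list means TensorFlow will run on CPU only.
print("GPUs:", tf.config.list_physical_devices("GPU"))
print("CPUs:", tf.config.list_physical_devices("CPU"))
```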
When you have a hammer, everything looks like a nail.
4 Workflow for a deep learning network
4.1 Step 1: Data ingestion, preparation, and processing
Source: CrowdFlower
The most time-consuming but also the most creative part of the job. It takes more than 80% of the time and requires experience and domain knowledge.
Data quality determines the upper limit for how well deep learning can perform.
Garbage in, garbage out.

- Data prep for structured/tabular data.
- Data prep for special DL tasks (a sketch follows this list):
  - Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.
  - Data tokenization: break sequences into units, map units to vectors, align and pad sequences.
  - Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; can be part of model training.
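The sketch below illustrates these preprocessing ideas with Keras layers. The toy sentences, vocabulary size, sequence length, and embedding dimension are illustrative assumptions, and it assumes TensorFlow >= 2.6, where these layers live under keras.layers.

```python
import tensorflow as tf
from tensorflow import keras

# Tokenization: break sentences into word units, map units to integer ids,
# and align/pad all sequences to a common length.
texts = ["deep learning is fun", "learning is hard work"]
vectorizer = keras.layers.TextVectorization(max_tokens=100,
                                            output_sequence_length=6)
vectorizer.adapt(texts)
ids = vectorizer(texts)  # shape (2, 6), zero-padded integer ids

# Embedding: sparse integer ids -> dense vectors, learned with the model.
embed = keras.layers.Embedding(input_dim=100, output_dim=8)
print(embed(ids).shape)  # (2, 6, 8)

# Image data prep as layers: pixel scaling and train-time augmentation.
rescale = keras.layers.Rescaling(1.0 / 255)      # map pixels to [0, 1]
augment = keras.layers.RandomFlip("horizontal")  # random train-time flips
```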
4.2 Step 2: Select neural network
- Architecture.
Source: https://www.asimovinstitute.org/neural-network-zoo/
- Activation function.
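A minimal sketch of specifying an architecture and its activation functions in tf.keras (TensorFlow 2.x); the layer widths and the 10-class softmax output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# A small multilayer perceptron: ReLU in the hidden layers,
# softmax for a multi-class output.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.summary()  # prints the architecture and parameter counts
```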
4.3 Step 3: Select loss function
Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.
Classification loss: cross-entropy loss, …
Custom losses.
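In Keras the loss is chosen at compile time, and a custom loss is just a function of (y_true, y_pred) that returns a scalar. A minimal sketch; the asymmetric weighting below is an illustrative assumption, not a standard recipe.

```python
import tensorflow as tf
from tensorflow import keras

def weighted_mse(y_true, y_pred):
    # Penalize under-prediction twice as much as over-prediction (illustrative).
    err = y_true - y_pred
    weight = tf.where(err > 0, 2.0, 1.0)
    return tf.reduce_mean(weight * tf.square(err))

model = keras.Sequential([keras.layers.Dense(1, input_shape=(3,))])
model.compile(optimizer="adam", loss=weighted_mse)
# Built-in losses are selected the same way, e.g. loss="mse", loss="mae",
# or loss="binary_crossentropy" for classification.
```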
4.4 Step 4: Train and evaluate model
Choose an optimization algorithm. There is a trade-off between generalization (plain SGD often generalizes better) and convergence rate (adaptive methods such as Adam converge faster).
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
- Fitness of model: underfitting vs overfitting.
Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- Model selection: \(K\)-fold cross validation.
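A minimal sketch of training, evaluating, and \(K\)-fold cross validation for a small Keras model, using scikit-learn's KFold for the splits; the synthetic data and all hyperparameters are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import KFold

X = np.random.rand(200, 10).astype("float32")
y = (X.sum(axis=1) > 5).astype("float32")  # synthetic binary labels

def build_model():
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Adam converges fast; swap in keras.optimizers.SGD(...) to compare.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = build_model()  # fresh weights for each fold
    model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=32, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    scores.append(acc)

print("Mean CV accuracy:", np.mean(scores))
```

Rebuilding the model with fresh weights in each fold keeps the folds independent; reusing a trained model would leak information across splits.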
5 Keras examples
The following are selected examples from the collection of Keras code examples.